Translocation of structured polynucleotides through nanopores 
We investigate theoretically the translocation of structured RNA/DNA molecules through narrow 
pores which allow single but not double strands to pass. The unzipping of basepaired regions within 
the molecules presents significant kinetic barriers for the translocation process. We show that 
this circumstance may be exploited to determine the full basepairing pattern of polynucleotides, 
including RNA pseudoknots. The crucial requirement is that the translocation dynamics (i.e., the 
length of the translocated molecular segment) needs to be recorded as a function of time with a 
spatial resolution of a few nucleotides. This could be achieved, for instance, by applying a mechanical 
driving force for translocation and recording force-extension curves (FEC's) with a device such as 
an atomic force microscope or optical tweezers. Our analysis suggests that with this added spatial 
resolution, nanopores could be transformed into a powerful experimental tool to study the folding 
of nucleic acids. 



A series of recent experiments studied the translo- 
cation of DNA and RNA molecules through narrow 
pores, which allow single but not double strands to pass;
see Ref. for a review. These
investigations pursued two main goals: (i) to probe in 
a well-defined model system the physics of biopolymer 
translocation across membranes, a process which is ubiq- 
uitous in cell biology, and (ii) to explore the potential of 
nanopores as a single-molecule tool. In the experiments 
so far, a membrane protein, α-hemolysin, was used as the
pore. An electric field acting on the negatively charged 
DNA/RNA backbone drives the molecules through the 
pore, and translocation is monitored by measuring the 
induced ionic current, which is strongly reduced while a 
DNA/RNA chain blocks the pore. Until very recently,
the experiments have focused on the translocation
of unstructured, mostly homopolymeric molecules,
a problem which has also received considerable theoretical
interest. For such
unstructured molecules, the main results regarding the 
above goals were that (i) the basic physics of transloca- 
tion is adequately described by a drift-diffusion process, 
in which monomers hop randomly in and out of the pore 
with a directional bias due to the applied voltage, and
(ii) nanopores could possibly be developed into rapid sequencing
devices, since the ionic current during blockage
displays a weak sequence-dependence.

In contrast, for structured polynucleotides, both the 
basic physics and the potential applications of translo- 
cation still remain largely unexplored. Experimentally, 
important first steps have been taken by studying the 
translocation of simple hairpin (i.e., stem-loop) structures
and the unzipping of double-stranded DNA
through a nanopore. However, a general theoretical
framework to describe translocation of these as well as 
more complex RNA/DNA structures is currently lacking. 
Here, we first construct such a framework and then use 
it to investigate the potential of nanopores as a single- 
molecule tool for the study of biopolymer folding. 

In this article, we are interested in the generic physi- 
cal aspects of the translocation process that neither de- 
pend on the specific properties of a particular protein 
pore, nor on the detailed way in which the driving force 
for translocation is applied. As in previous theoretical
studies, we use a coarse-grained
model which treats the pore basically as a separator
between a cis and a trans part of the molecule
with a characteristic friction coefficient, see the sketch
in Fig. 1. Presumably this description will apply directly
to solid-state nanopores [19], which can now
be fabricated with sizes down to ~2 nm, not much
larger than the ~1.5 nm aperture of the α-hemolysin
pore and slightly smaller than the ~2.2 nm diameter of
double-stranded DNA or stems in RNA.

FIG. 1: Sketch of a structured polynucleotide that is driven
across a nanopore which allows single but not double strands
to pass. Here, the driving force causing translocation from
the cis to the trans side is exerted by an electric field that
acts on the negatively charged backbone of the molecule.

Also, we do not
consider the full three-dimensional (tertiary) structure of 
the molecules, but focus on the basepairing pattern, i.e. 
the secondary structure including possible pseudoknots, 
which are the only structural features present when there 
are no divalent metal ions in the solution. Unless stated 
otherwise, the term 'structure' refers here to this base- 
pairing pattern. While both our theoretical framework 
and our conclusions apply equally to RNA and single- 
stranded DNA, the RNA case is particularly interesting, 
since structured RNAs have a multitude of functions in
molecular biology and RNA folding is an active field of
research.



General theoretical framework 

Fig. 1 depicts schematically the driven translocation of
a structured polynucleotide from the cis to the trans side 
of the pore. We seek here a convenient reduced descrip- 
tion of this translocation process, rather than modeling 
the full three-dimensional polymer dynamics explicitly. 
Our approach is similar in spirit to the existing models
for the case of unstructured polymers, where
the translocation dynamics is formulated in terms of a
single variable, e.g. the number of nucleotides, $m$, on
the cis side, see Fig. 1. The dynamics, $m(t)$, is stochastic
and can be described by 'hopping rates', $k_-(m)$ and
$k_+(m)$, for forward and backward motion of the nucleotide
chain through the pore with a stepsize of one
monomer. The external force on the molecule leads to
an imbalance in the hopping rates, $k_-(m) > k_+(m)$, and
hence a mean drift towards the trans side. For unstruc- 
tured molecules the one-dimensional description is per- 
missible, if the relaxation of the polymer degrees of free- 
dom on both sides of the pore is faster than the hopping 
process. This assumption does not hold for arbitrarily 
long polymers, since the relaxation time increases with 
the polymer length; however, for lengths on the
order of a thousand bases, the one-dimensional description
is adequate under typical experimental conditions.
The residual effect of the polymer ends is then only
to introduce an entropic barrier for translocation, which 
leads to a weak m-dependence of the hopping rates. 

For structured molecules, the translocation dynamics 
is considerably more complicated, since the dynamics of 
the 'reaction coordinate', m(t), is then coupled to the 
dynamics of the basepairing patterns on both sides: the 
structure on the cis side, $S_{\mathrm{cis}}(t)$, affects the forward rate,
while the structure on the trans side, $S_{\mathrm{trans}}(t)$, affects the
backward rate,

$$ m \xrightarrow{\;k_-(m,\,S_{\mathrm{cis}}(t))\;} m-1 \,, \qquad m \xrightarrow{\;k_+(m,\,S_{\mathrm{trans}}(t))\;} m+1 \,. \qquad (1) $$

In two limiting cases however, the process can be modeled
by a one-dimensional Brownian walk as for unstructured
molecules, but with a complex sequence/structure-dependent
free energy landscape $\mathcal{F}(m)$ along the coordinate
$m$: (A) If the dynamics of the basepairing patterns
$S_{\mathrm{cis}}(t)$ and $S_{\mathrm{trans}}(t)$ is much faster than the hopping process,
the landscape is determined by the ensemble free energy
of all basepairing patterns on the cis and trans side.
(B) In the opposite limit, the basepairing pattern on the
cis side is essentially frozen and is unzipped basepair by
basepair as it is driven through the pore. The landscape
is then determined by the basepairing energetics of the
particular molecular structure prior to translocation, see
below. In both cases, the free energy naturally decomposes
into three parts,

$$ \mathcal{F}(m) = \mathcal{F}_{\mathrm{cis}}(m) + \mathcal{F}_{\mathrm{trans}}(m) + \mathcal{F}_{\mathrm{ext}}(m) \,, \qquad (2) $$

where $\mathcal{F}_{\mathrm{cis}}(m)$ and $\mathcal{F}_{\mathrm{trans}}(m)$ denote the intrinsic binding
free energies of the cis and trans parts of the molecule,
while $\mathcal{F}_{\mathrm{ext}}(m)$ describes the effect of the external force.
Given $\mathcal{F}(m)$, the simplest form for the hopping rates
$k_\pm(m)$ which satisfies the detailed balance condition
$k_+(m)/k_-(m+1) = e^{-\beta[\mathcal{F}(m+1)-\mathcal{F}(m)]}$ (with $\beta = 1/k_B T$) is

$$ k_\pm(m) = k_0 \, e^{-\beta \max\{\mathcal{F}(m\pm 1)-\mathcal{F}(m),\,0\}} \,. \qquad (3) $$



Here, $k_0$ denotes a microscopic rate constant, which can
in principle be tuned by adjusting the properties of the
pore. It can be interpreted as a friction coefficient and
corresponds approximately to the bare hopping rate for
unstructured molecules at zero external force (typical experimental
estimates for $k_0$ are on the order of $10^5\,\mathrm{s}^{-1}$).
The dynamics of the translocation process, as described
by Eqs. (2) and (3), is dominated by energetic barriers
due to basepairing, whereas the above-mentioned
entropic barrier is completely negligible for structured
molecules. These energetic barriers lead to arrests during
translocation, as clearly observed already in the experiments
with simple hairpins and double-stranded
DNA.
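
As a minimal sketch of how Eq. (3) can be turned into a simulation step, the following Python snippet computes the two hopping rates from a tabulated landscape and draws one step of the resulting continuous-time random walk. The function names, the Gillespie-type update, the treatment of the table ends, and the use of a static landscape are illustrative assumptions and not necessarily the scheme used for the results reported here; energies are in units of $k_B T$ by default.

```python
import math
import random


def hopping_rates(F, m, k0=1.0, beta=1.0):
    """Return (k_minus, k_plus) at position m for a tabulated landscape F.

    k_minus is the rate for m -> m-1 (forward, towards the trans side),
    k_plus the rate for m -> m+1 (backward). Steps that would leave the
    table get rate 0 (an assumption made for this sketch).
    """
    def rate(target):
        if target < 0 or target >= len(F):
            return 0.0
        # Metropolis-like form: uphill steps are penalized, downhill steps are not.
        return k0 * math.exp(-beta * max(F[target] - F[m], 0.0))

    return rate(m - 1), rate(m + 1)


def gillespie_step(F, m, t, rng, k0=1.0, beta=1.0):
    """Advance the walk by one hop; returns the new (m, t)."""
    k_minus, k_plus = hopping_rates(F, m, k0, beta)
    total = k_minus + k_plus
    if total == 0.0:
        return m, math.inf  # no move possible
    t += rng.expovariate(total)  # exponentially distributed waiting time
    m += -1 if rng.random() < k_minus / total else 1
    return m, t
```

For mechanical pulling the external term of the landscape depends on time, so in such a sketch the table `F` would have to be recomputed as the pulling device moves.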



Pulling through a pore 

Qualitative aspects. We now make use of the theoreti- 
cal framework constructed above to investigate which in- 
formation on structured molecules could be derived from 
pore translocation experiments. To this end, it is useful 
to compare unzipping by driven translocation through 
a nanopore with the more conventional way of unzipping
by applying a force on the ends of a biopolymer,
see e.g. Ref. [25]. As illustrated in Fig. 2, the two
approaches differ fundamentally: Pulling on the ends induces
a spontaneous unfolding order for the individual
structural elements, which is a function of their relative
stabilities and the topology of the structure.

FIG. 2: Unzipping a structured molecule by pulling on its
ends is fundamentally different from unzipping by driven
translocation through a narrow pore. (a) For pulling on the
ends, the stems (i.e., contiguously basepaired segments) in
the molecule unfold in an order determined by their relative
stability and the topology of the structure (a possible order
1-9 is indicated). (b) In contrast, the pore forces the stems to
unfold in a linear order along the sequence, as again indicated
by the numbering 1-9.

In contrast, the nanopore prescribes a linear order along the
sequence, and unfolds an RNA molecule much as en- 
zymes such as the ribosome do in cells. This difference 
suggests that the two approaches can also yield differ- 
ent types of information about the molecule under study. 
As demonstrated by Onoa et al. [27], clever use of the
pulling on the ends approach can reveal detailed infor- 
mation on the (un)folding pathway of an RNA molecule 
with known structure. However, when the structure of 
an RNA molecule is unknown, pulling on the ends can 
provide, by itself, little information beyond a count of 
the number of structural elements that unfold separately
[27]. In the following we therefore focus on the question
of how much structural information may in principle
be obtained with the nanopore approach. 

Let us suppose that we were able to observe the tra- 
jectories m(t) of the molecules during the translocation 
process. We could then assign a position within the se- 
quence to each arrest during translocation. Since an ar- 
rest is caused by a kinetic barrier, i.e. a stem trapped 
at the entrance to the pore, we could thereby identify 
the positions of the stems in the structure. Such infor- 
mation can indeed be sufficient to reconstruct almost the 
entire basepairing pattern of a molecule, as we demon- 
strate explicitly using an example below. If the translo- 
cation dynamics is in the strongly driven limit (B) where 
the structure on the cis side is essentially frozen, then 
the reconstructed structure would correspond to the ini- 
tial structure of the molecule before translocation. We 
concentrate on this limit in the following, including a 
discussion of its attainability. However, it may be note- 
worthy that in the slow translocation limit (A) one would 
also obtain useful structural information, namely on the 
average structure of the molecule (with respect to the
thermodynamic ensemble of all structures). As long
as the molecule is 'well-designed' this average structure
will be dominated by the ground-state, i.e. the minimum
binding free energy structure (see footnote 1).

How could one possibly observe the trajectories m(t) 
during translocation? For the purpose of structure de- 
termination, we will need m(t) with a spatial resolution 
below the typical length of a stem in an RNA struc- 
ture (5-10 basepairs). This may be achievable through 
a refinement of the current nanopore technology, such 
that careful analysis of the ionic current allows a count 
(or even sequencing) of the bases that have passed the 
pore. With artificial solid-state pores, it is
also conceivable to use a tunneling current through leads 
within the membrane as a probe to count (or sequence) 
the bases as they pass through the pore. Here, we explore 
yet another option, namely pulling the molecule mechan- 
ically through the pore, with a device that can record 
force-extension curves, e.g. an atomic force microscope 
or optical tweezers. The explicit discussion of this case 
with an exemplary RNA sequence serves us to gauge the 
more general capability of nanopores as single-molecule 
tools for the study of biopolymer folding. 

Quantitative aspects. Mechanical unfolding of a 
biopolymer yields characteristic sawtooth-shaped signa- 
tures in the force-extension curve (FEC) indicating the 
opening of structural elements within the molecule, see 
e.g. [25, 27]. From the relative positions of these
sawteeth one can determine length changes within the 
molecule with an extremely high resolution of about 
1 nm. In the usual setup where the molecule is unfolded 
by pulling on its ends, such length changes can only be 
used to infer the 'stored length' of a structural element, 
but not its precise position along the backbone of the 
molecule, cf. Fig. 2. In contrast, for mechanical pulling
through a pore, the relative positions of the resulting 
sawteeth will correspond directly to the relative positions
of the structural elements in the sequence (see footnote 2). One
conceivable way to prepare the initial condition where 
an RNA molecule is almost entirely on the cis side, with 
one end threaded through the pore and attached to a 
pulling device on the trans side, is to start with an at- 
tached molecule on the trans side and to apply a voltage 
pulse across the pore that suffices to drive the molecule 
as far as possible to the cis side. 



1 The worst case for the purpose of structure determination corresponds
to the regime where the typical timescale for the translocation
of, say, a single hairpin is comparable to the timescale for
structural rearrangements involving the formation of new stems:
in this case, the structure on the cis side may relax after a stem
is unzipped, so that one would observe only the signatures of the
relaxed structure rather than the original structure. This regime
should be avoided by a proper choice of the driving force and the
friction coefficient of the pore (R. Bundschuh and U. Gerland, to
be published).

2 The absolute position can be inferred by adding a known struc- 
tural element, e.g. a strong C-G hairpin, to one end of the RNA, 
which can then function as a reference point. 

To apply our general model to the particular case of
mechanical pulling in the strongly driven (fast pulling) 
limit, we need to specify the form of the three terms in 
the free energy landscape (2). The second term, i.e. the
binding free energy on the trans side, may be set to zero,

$$ \mathcal{F}_{\mathrm{trans}}(m) = 0 \,, \qquad (4) $$

since the reformation of structure after translocation is
suppressed at high tensions in the RNA single strand (see footnote 3).
The third term, $\mathcal{F}_{\mathrm{ext}}(m)$, describes the effect of the mechanical
stress on the RNA, which stretches the single-stranded
trans part of the molecule. The elastic response
of this single strand may be modeled by a freely jointed
chain (FJC) polymer model. Assuming for simplicity a
constant pulling speed $v$, the third term then takes the
form

$$ \mathcal{F}_{\mathrm{ext}}(m) = \mathcal{F}_{\mathrm{FJC+spring}}(v \cdot t;\, N - m) \,. \qquad (5) $$

Here, the function $\mathcal{F}_{\mathrm{FJC+spring}}(R_t; n)$ denotes the combined
free energy of a single-stranded RNA of $n$ bases
in series with a linear spring, stretched to a total extension
$R_t = v \cdot t$ [28]. (The linear spring takes into
account the stiffness of the force-measuring device, see
the Appendix for details.) By assumption, the first term,
$\mathcal{F}_{\mathrm{cis}}(m)$, represents the binding free energy of the remaining
part of the initial structure on the cis side. $\mathcal{F}_{\mathrm{cis}}(m)$
can be calculated for any initial structure, based on the 
free energy rules for RNA secondary structure [13] with 
a natural extension for pseudoknotted structures, see the 
Appendix. Our assumption of a frozen structure on the 
cis side is most likely an oversimplification for realistic 
pulling speeds, since small fluctuations in the secondary 
structure are known to occur already on timescales on 
the order of tens of microseconds [31]. However, since
the pore pulling approach is sensitive only to stem po- 
sitions, we expect that it is unaffected by small fluctua- 
tions and sensitive only to major rearrangements which 
significantly change the secondary structure. Such re- 
arrangements are typically slow, sometimes even on the 
timescale of hours.



Reconstruction of secondary structures. To illus- 
trate the problem and the method, we use an exemplary 
RNA, the well-studied self-splicing intron of Tetrahymena 
thermophila with a sequence of 419 bases (Genbank
# V01416). In its correctly folded active state, the
basepairing pattern of this ribozyme contains a pseudoknot
(see Fig. 3b), while its best characterized long-lived
folding intermediate has a known alternative
structure without pseudoknot (see Fig. 3a).
We will investigate whether one can in principle use the
pulling-through-a-pore approach not only to discriminate
between these two different conformations in individual
molecules, but also to reconstruct both structures from
the FEC's.

3 For instance, Liphardt et al. observed refolding rates for a
single hairpin around $1\,\mathrm{s}^{-1}$ at the unfolding force $f_{1/2} \approx 14$ pN.
At a pulling speed of say 1 μm/s, the translocation of an RNA
molecule with a thousand bases would therefore be terminated
before refolding of a structural element on the trans side occurs.

To obtain FEC's for these structures, we performed 
Monte-Carlo simulations of the stochastic process defined 
by Eqs. (1)-(3), and used the corresponding equation from the Appendix to
calculate the force and extension time traces. We performed
all calculations at the same pulling speed ($v = 0.1$ nm/time step,
which roughly corresponds to 10 μm/s
given typical values for $k_0$, see above), and the same stiffness
of the force-measuring device ($\lambda = 0.5$ pN/nm).
Fig. 4 displays three such FEC's (solid lines) for the
non-pseudoknotted structure of Fig. 3a, corresponding
to unzipping from the 3' end. These FEC's show the 
sawtooth-like behavior which is characteristic for the se- 
quential opening of structural elements (a very similar 
behavior was observed in the experiments of Onoa et al. 
[27] where the molecule was rapidly unzipped by pulling
on its ends). The rising parts of the sawteeth correspond 
to stretching of single strand on the trans side as a stacked 
region is "trapped" in front of the pore on the cis side. 
When a stacked region opens, some single strand is freed 
to pass the pore, which leads to relaxation of the ten- 
sion and causes the downstrokes in the FEC's. Note that 
the FEC's do not share all of their sawteeth, which re- 
flects the importance of thermal fluctuations for this type 
of single molecule experiments (this property is manifest 
also in the experiment of Onoa et al. [27]).

The most relevant information contained in the FEC's 
are the positions of the translocation arrests, during 
which the required force for the opening of basepairs is 
built up. To extract these positions, we use FEC's of 
freely jointed chains with different lengths: The dashed 
lines in Fig. 4 show some examples of such FEC's where
the chain length n coincides with the length of the RNA 
single strand on the trans side during such an arrest. 
With an automated procedure described in the Appendix 
we obtain all of these positions (above a threshold for the 
duration of an arrest). 
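
The automated procedure itself works on the FEC's and is described in the Appendix. As a simplified stand-in, the sketch below detects arrests directly in a discretely sampled trajectory $m(t)$ rather than in the FEC: it reports maximal runs of samples that stay within a small window around the first sample of the run and last at least a threshold number of samples. The function name, the tolerance `tol`, and the sample-based duration threshold are assumptions introduced for illustration.

```python
def find_arrests(m_traj, min_duration, tol=1):
    """Return [(position, duration)] for dwell periods in a sampled trajectory.

    A dwell is a maximal run of consecutive samples that stay within `tol`
    nucleotides of the run's first sample; only runs lasting at least
    `min_duration` samples are reported.
    """
    arrests = []
    start = 0
    for i in range(1, len(m_traj) + 1):
        run_ends = i == len(m_traj) or abs(m_traj[i] - m_traj[start]) > tol
        if run_ends:
            if i - start >= min_duration:
                arrests.append((m_traj[start], i - start))
            start = i
    return arrests


# Toy trajectory: two long dwells (near m = 50 and m = 40) separated by fast motion.
trajectory = [50, 50, 49, 50, 50, 44, 43, 40, 40, 41, 40, 40, 35]
arrests = find_arrests(trajectory, min_duration=4)
```

Raising `min_duration` suppresses short pauses caused by thermal fluctuations, at the price of missing weak stems.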

Since the bases around the position of an arrest are 
very likely basepaired with another segment of the sequence
further to the 5' end, we represent this information
by a closing angular bracket, '>', above that position
in the RNA sequence (written from 5' to 3'), see Fig. 5.
Of course, the molecule can also be pulled through the
pore in the other direction, i.e. from the 5' end. This
yields information on the positions of segments that have
downstream binding partners. The same procedure then
leads to the opening brackets, '<', also shown in Fig. 5.

FIG. 3: Secondary structure of the Tetrahymena thermophila
Group I intron: (a) Long-lived folding intermediate. (b)
Native state with pseudoknot. The basepairs shown in green
are correctly reconstructed from the force-extension curves,
see Fig. 4, using the procedure described in the main text,
while the bases shown in red are involved in incorrect basepair
predictions (the procedure yields no prediction for the bases
shown in black); see also Fig. 5.

FIG. 4: Force-extension traces (solid lines) as obtained
with our stochastic model for mechanical pulling through a
nanopore. (a) and (b) each show three different runs with the
same initial conditions (and pulling speed of v = 0.1 nm/time
step) for the structures in Fig. 3(a) and (b), respectively. The
force (f) and extension (R) are calculated using the corresponding
equation in the Appendix. The dashed lines are freely jointed chain FEC's
whose lengths are fitted to some of the positions that correspond
to translocation arrests.

Bracket representations are a widely used shorthand
notation for RNA secondary structures. For the structures
in Fig. 3, such a representation is shown in the third
row of Fig. 5. Note that two types of brackets have to be
used for the pseudoknotted native structure, in order to
make the association between opening and closing brackets
unambiguous. We observe that the angular brackets
extracted from the FEC's can be viewed as an incomplete
bracket representation of the RNA secondary structure.
Can we complete it using only the given sequence of the
RNA molecule?
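
To make the role of the two bracket types concrete, here is a small Python sketch that converts a complete bracket string into a list of base pairs, using one stack per bracket type so that crossing (pseudoknotted) pairs stay unambiguous. The function names and the choice of `()` and `[]` as the two bracket types are assumptions made for this example.

```python
def parse_brackets(structure, bracket_types=("()", "[]")):
    """Convert a bracket string into a sorted list of (i, j) base pairs (0-based)."""
    opening = {b[0]: b[1] for b in bracket_types}
    closing = {b[1]: b[0] for b in bracket_types}
    stacks = {o: [] for o in opening}  # one stack per bracket type
    pairs = []
    for j, ch in enumerate(structure):
        if ch in opening:
            stacks[ch].append(j)
        elif ch in closing:
            stack = stacks[closing[ch]]
            if not stack:
                raise ValueError(f"unmatched '{ch}' at position {j}")
            pairs.append((stack.pop(), j))
    for o, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched '{o}' at position {stack[-1]}")
    return sorted(pairs)


def has_pseudoknot(pairs):
    """True if any two pairs cross, i.e. i < k < j < l."""
    return any(i < k < j < l for (i, j) in pairs for (k, l) in pairs)


nested = parse_brackets("((..))..((..))")
knotted = parse_brackets("((..[[..))..]]")
```

With a single bracket type the string for `knotted` could not be decoded uniquely, which is why the second type is needed.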

This task is a sequence alignment problem, which con- 
sists of matching each opening (closing) bracket with 
an associated downstream (upstream) binding sequence. 
Several circumstances conspire to make this, somewhat 
surprisingly, a nontrivial problem: (i) stems, i.e. contigu- 
ous basepaired regions, are usually short, typically 5-10 
basepairs, (ii) structural elements often lead to a different 
number of angular brackets in the two pulling directions, 
i.e. not every opening bracket has a corresponding clos- 
ing bracket and vice versa, and (iii) sequence segments 
containing several U's have many possible binding part- 
ners, since U's can pair with A's and G's. 
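
Point (iii) can be illustrated with a short Python sketch that lists all sequence positions whose segment could pair, in antiparallel orientation, with a given segment when Watson-Crick and G-U pairs are allowed. The function names are hypothetical, and the sketch ignores stacking energies and minimum loop lengths; it is not the probabilistic alignment algorithm of the Appendix.

```python
# Allowed base pairs: Watson-Crick plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    return (a, b) in ALLOWED_PAIRS


def partner_sites(seq, start, length):
    """Start indices k such that seq[k:k+length] can pair antiparallel with
    seq[start:start+length], excluding segments that overlap the query."""
    segment = seq[start:start + length]
    hits = []
    for k in range(len(seq) - length + 1):
        if k < start + length and start < k + length:
            continue  # candidate overlaps the query segment
        candidate = seq[k:k + length]
        # Antiparallel pairing: first query base pairs with last candidate base.
        if all(can_pair(x, y) for x, y in zip(segment, reversed(candidate))):
            hits.append(k)
    return hits


u_rich = partner_sites("UUUAGGAAA", 0, 3)   # U-rich query: many candidate partners
c_rich = partner_sites("CCCAGGAAA", 0, 3)   # C-rich query in the same context: none
```

In this toy example the U-rich segment finds four candidate partner sites, while the C-rich segment finds none, which is the ambiguity the alignment has to resolve.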

To overcome this problem, we developed a probabilistic
sequence alignment algorithm (see Appendix), which
identifies the most likely set of stems that is consistent
with all angular brackets and where all paired sequence
segments contain at least one angular bracket on each
side. The output of this algorithm is shown in the fourth
rows of Fig. 5, where lower case letters indicate paired sequence
segments and the alphabetic order represents the
confidence level (confidence is largest for 'a'). In Fig. 3,
the bases involved in this reconstructed set of stems are
colored, with green (red) indicating (in)correct basepairing.
We observe that the two different basepairing patterns
(for the same sequence) are clearly distinguished
and the large scale secondary structure is captured in
both cases. In particular, the pseudoknot in the native
structure is correctly identified. The only incorrectly predicted
stem is the least significant one ('k') in the pseudoknotted
structure.

While these results seem satisfactory as a proof of principle,
we stress that our reconstruction algorithm can certainly
be improved upon, e.g. by allowing for mismatches
in longer stems, which should help to fill in many of the
missed basepairs. Also, one could make use of the known
basepairing energies in the reconstruction.

[Figure 5, panels: (A) Non-pseudoknotted intermediate state; (B) Pseudoknotted native state.]

FIG. 5: Reconstruction of the basepairing pattern from the
FEC's. First row: parentheses extracted from the FEC's,
which indicate the position of basepaired regions. Second
row: RNA sequence. Third row: parentheses indicating the
basepairs in the full structures shown in Fig. 3. Fourth row:
stems predicted from the parentheses in the first row by sequence
alignment. See main text for details.



Discussion and Outlook


Our theoretical study has led us to a simple coarse-grained
model, Eqs. (1)-(3), for the translocation of structured
polynucleotides, which is applicable in the two opposite
limits of very slow and very rapid translocation.
This model is a useful starting point for a more detailed 
description that remains valid in the entire parameter 
regime. Here, we have applied the model to demon- 
strate that the physics of the translocation process can 
in principle be exploited to use nanopores for secondary 
structure determination (including pseudoknots) on the 
single-molecule level. Indeed, the nanopore technique 
would be a useful addition to the existing repertoire of 
structure determination methods: RNA secondary structure
can be predicted computationally to some extent
based on experimentally determined free energy
rules; however, this approach is unreliable for
RNA molecules exceeding ~100 bases and cannot take
pseudoknots properly into account. Including pseudoknots,
which are often crucial to the function of RNA
enzymes, is not only computationally expensive
but is also limited by a lack of experimental
information on the corresponding binding free
energies. Experimentally, X-ray crystallography [35] or
NMR [36] provide detailed structures, but these techniques
are cumbersome and limited to small molecules
or isolated domains of larger RNAs. Structural information
for larger RNAs can currently only be obtained from
comparative sequence analysis [38], which requires large
sets of homologous RNA sequences, or from indirect biochemical
methods [32].

Throughout this paper, we have focused on basepairing 
only, which is permissible under ionic conditions that dis- 
favor tertiary interactions, e.g. low sodium and no mag- 
nesium. However, once the translocation of a molecule 
is well characterized under these conditions, it becomes 
interesting to switch to the native ionic conditions and 
examine the effects of tertiary interactions. Generally, 
one can expect more cooperativity in the presence of tertiary
interactions, i.e. larger domains will open in a single
step, as observed by Onoa et al. [27]. This suggests
a hierarchical approach to structure determination with 
nanopores: first unzip under low ionic conditions to ob- 
tain the secondary structure, and then repeat in the pres- 
ence of magnesium to identify how the secondary struc- 
ture elements are grouped into larger tertiary structure 
domains (such as the P4-P6 domain in the Tetrahymena 
ribozyme). It is worthwhile to stress the advantage of 
RNA as a model system to separately study the effect 
of secondary and tertiary structure. In contrast, the sec- 
ondary structure of proteins is not stable in the absence of 
tertiary structure, and hence one may expect that single- 
domain proteins will unfold and translocate across a pore 
in a single step. 

Nanopores could in principle also be used to probe 
the kinetics of large-scale secondary structure rearrangements
in single molecules. For instance, it would be useful
to attach larger objects to both ends of a molecule
that is already threaded through the pore, allowing the 
same molecule to be driven forth and back through the 
pore, over and over again. By varying the time inter- 
val between successive reversals of the driving force, one 
could then probe structural relaxation over a broad range 
of time scales. More generally, nanopores may emerge 
as a new tool to probe intra- and inter-molecular inter- 
actions in single biomolecules. For instance, one could 
probe the biophysics of combined binding and folding in 
the context of RNA-protein interactions. 
Appendix 

Calculation of free energy landscape. Given a secondary
structure of the molecule, we obtain $\mathcal{F}_{\mathrm{cis}}(m)$ by
eliminating all basepairs involving the terminal $N - m$
bases, and calculating the binding free energy of the remaining
structure according to the free energy rules for
RNA secondary structure. [We take the free energy
parameters as supplied with the Vienna RNA package
(version 1.3.1) at room temperature T = 25°C. The salt
concentrations at which these parameters were measured
are [Na$^+$] = 1 M and [Mg$^{2+}$] = 0 M.] For pseudoknotted
structures, the free energy rules currently include no pre- 
scription, however the following extension appears rea- 
sonable: we first eliminate basepairs in stems that give 
rise to the pseudoknot(s) and calculate the free energy of 
the remaining structure according to the standard rules. 
We then add the free energies of the eliminated stems sep- 
arately, including the free energy for the loops created by 
these stems, again according to the standard free energy 
rules (however, the bases in these loops that are involved 
in other stems are removed before calculating the loop 
free energy). 
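
The truncation step for a structure without pseudoknots can be sketched as follows. This is a minimal illustration, not the code used for the results above: the function names are hypothetical, the structure is assumed to be given in dot-bracket notation, and the N - m translocated bases are assumed to be the last N - m positions of the sequence. The free energy of the returned structure would then be evaluated with an external routine such as the Vienna RNA package.

```python
def parse_pairs(dot_bracket):
    """Return the (i, j) base pairs (0-based, i < j) of a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


def truncate_structure(dot_bracket, m):
    """Dot-bracket string of the m bases that are still on the cis side.

    Every pair involving one of the N - m translocated bases
    (assumed here to be the last N - m positions) is removed.
    """
    if not 0 <= m <= len(dot_bracket):
        raise ValueError("m must lie between 0 and N")
    kept = ["."] * m
    for i, j in parse_pairs(dot_bracket):
        if j < m:  # both partners are still on the cis side
            kept[i], kept[j] = "(", ")"
    return "".join(kept)
```

Calling `truncate_structure` for m = N, N - 1, ..., 0 and evaluating the energy of each result yields one point of the landscape per value of m.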

The trans part of the molecule is tethered at both ends, 
by the pore and the pulling device, respectively. The 
pulling device can be described by a linear spring, while 
the configurational entropy of the RNA single strand can 
be modeled by a freely jointed chain (FJC) with extensible 
segments. [For the few bases that are inside the 
pore, we neglect the effect of the confinement on the entropy.] 
We denote by $R_t$ the total extension of the trans 
part in series with the linear spring. The free energy (3) 
can then be expressed in terms of the total end-to-end 
distance distribution $W_{\mathrm{FJC+spring}}$, 

$$\mathcal{F}_{\mathrm{FJC+spring}}(R_t; n) = -k_B T \log W_{\mathrm{FJC+spring}}(R_t; n),$$

which can in turn be written as the convolution of the 
individual end-to-end distance distributions of the FJC 
and the spring 

$$W_{\mathrm{FJC+spring}}(R_t \,|\, n) = \int_0^{\infty} dR\; W_{\mathrm{FJC}}(R; n)\, W_{\mathrm{spring}}(R_t - R).$$


Here, $W_{\mathrm{spring}}(R_s) \propto \exp(-\beta \lambda R_s^2/2)$, suitably normalized, where 
$\lambda$ denotes the inherent stiffness of the pulling device. 
We calculate the end-to-end distance distribution of the 
freely jointed chain, $W_{\mathrm{FJC}}(R; n)$, as described previously 
[…]. The polymer parameters we use were obtained from 
a fit […] to FEC's of single-stranded DNA [44] (base- 
to-base length 0.7 nm, Kuhn length 1.9 nm, and stretch 
modulus 815 pN), since we are unaware of corresponding 
data for the chemically very similar RNA. 
Calculation of FEC's. We obtain several trajectories 
$m(t)$ with a Monte Carlo simulation of Eqs. (2, …) with 
$m(0) = N$, $R_t(0) = 0$ as initial condition and incrementing 
$R_t$ at the constant rate $v$. The simulation is stopped 
when all bases have translocated (m = 0) . From the time 
trace m(t), we calculate the force-extension curve f(R) 
using 

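$$\langle f \rangle = \frac{\partial}{\partial R_t}\, \mathcal{F}_{\mathrm{FJC+spring}}(R_t = vt;\; N - m(t)) \qquad (6)$$

and $\langle R \rangle = vt - \langle f \rangle/\lambda$. Here, $\langle f \rangle$ and $\langle R \rangle$ are both thermal 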
averages over the polymer and spring degrees of freedom 
at fixed total extension R t and fixed basepairing pattern. 
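
For orientation, the relation between total extension, force, and polymer extension can be sketched in a mean-force approximation. The sketch below is not the thermal average of Eq. (6): it ignores fluctuations and simply balances the force of a linear spring against the standard force-extension relation of an extensible freely jointed chain, which is assumed here to correspond to the "FJC with extensible segments" of the text. The function names and the spring stiffness used in the example are hypothetical; the polymer parameters are those quoted above.

```python
import math

KT = 4.116      # k_B T at 25 C, in pN nm
B = 0.7         # base-to-base length, nm (from the text)
L_KUHN = 1.9    # Kuhn length, nm (from the text)
S = 815.0       # stretch modulus, pN (from the text)


def fjc_extension(f, n):
    """Mean extension (nm) of an n-base extensible FJC at force f (pN)."""
    x = f * L_KUHN / KT
    # Langevin function coth(x) - 1/x, with its small-x limit x/3
    langevin = x / 3.0 if x < 1e-6 else 1.0 / math.tanh(x) - 1.0 / x
    return n * B * langevin * (1.0 + f / S)


def mean_force(r_total, n, stiffness):
    """Force f solving fjc_extension(f, n) + f / stiffness = r_total (bisection)."""
    lo, hi = 0.0, stiffness * r_total  # the spring alone bounds the force from above
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if fjc_extension(mid, n) + mid / stiffness < r_total:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fec_point(t, v, n, stiffness):
    """Return (R, f) at time t for pulling speed v and n bases on the trans side."""
    f = mean_force(v * t, n, stiffness)
    return v * t - f / stiffness, f


# Example with an assumed stiffness of 0.1 pN/nm and 100 trans-side bases
extension, force = fec_point(t=5.0, v=10.0, n=100, stiffness=0.1)
```

Inserting the trajectory value N - m(t) for `n` at each time step would trace out an approximate force-extension curve; the full calculation replaces the force balance by the derivative of the free energy obtained from the convolution.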
Extraction of parentheses positions from FEC. For 
every point on a FEC, we determine the length n of the 
freely jointed chain whose FEC passes closest to the point 
(using the polymer parameters for single-stranded RNA 
as given above). We take a histogram of the resulting 
lengths n over three independent FEC's for each struc- 
ture. In this histogram, the lengths n that correspond 
to start positions of stably basepaired regions appear as 
peaks, since the length of single-stranded RNA on the 
trans side remains approximately constant while the force 
required to unzip the basepairs builds up. [A similar procedure 
was applied in Ref. […] to identify the positions 
of proteins bound to double-stranded DNA as it is being 
unzipped.] We keep all n-values where the histogram 
exceeds a threshold of 30 counts (a count is made every 
Monte Carlo time-step). Since thermal noise makes the 
molecule fluctuate back and forth by a few bases while 
the force is building up for the next stem to open, we pick 
out of each contiguous stretch in the remaining n-values 
only the largest. Finally, we increment the extracted n-values 
by one and mark the corresponding position in the 
sequence with a parenthesis. 
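
The extraction steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the original analysis code: the function names are hypothetical, the chain is described by the standard extensible freely jointed chain relation, and "closest" is taken to mean the chain length whose curve matches the measured extension at the measured force.

```python
import math
from collections import Counter

KT, B, L_KUHN, S = 4.116, 0.7, 1.9, 815.0  # pN nm, nm, nm, pN


def per_base_extension(f):
    """Extension per base (nm) of an extensible FJC at force f (pN)."""
    x = f * L_KUHN / KT
    langevin = x / 3.0 if x < 1e-6 else 1.0 / math.tanh(x) - 1.0 / x
    return B * langevin * (1.0 + f / S)


def chain_length(r, f):
    """Number of bases n whose FJC curve passes through the point (r, f)."""
    return round(r / per_base_extension(f))


def parenthesis_positions(points, threshold=30):
    """points: (extension in nm, force in pN) pairs, one per Monte Carlo step."""
    histogram = Counter(chain_length(r, f) for r, f in points if f > 0)
    kept = sorted(n for n, count in histogram.items() if count > threshold)
    positions = []
    for idx, n in enumerate(kept):
        last_in_stretch = idx + 1 == len(kept) or kept[idx + 1] != n + 1
        if last_in_stretch:          # largest n of each contiguous stretch
            positions.append(n + 1)  # shift by one, as described in the text
    return positions
```

The points of several independent curves can simply be concatenated before the call, which corresponds to the pooled histogram described above.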

Reconstruction of basepairing pattern. The FEC's 
do not reveal which opening and closing parentheses are 
paired with each other. However, given the sequence 
of the RNA, we can match the parentheses by sequence 
complementarity. [To keep the number of false basepair 
predictions to a minimum, we consider only stems where 
we have at least one parenthesis at each end.] Here, we 
summarize the essential steps in our sequence alignment 
algorithm, while a detailed presentation and characteri- 
zation will be given elsewhere (R. Bundschuh and U. Ger- 
land, to be published): First, we find all possible gap- 
less local alignments between a subsequence containing 
a parenthesis and subsequences to the open side of the 
parenthesis, using the scoring scheme 2 for GC, 1 for AU, 
and for GU. We keep only those alignments with a score 
larger than 5 and where the matching sequence segment 
also contains a matching parenthesis. We consider the 
remaining alignments as possible stems in the secondary 
structure. To pick the most likely set of mutually consistent 
stems, we assign an alignment E-value to each stem 
[47]. We then iteratively include the most likely stem 
into the structure prediction, and remove all other stems 
it excludes due to overlapping basepairs from the list of 
allowed stems. 
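
The scoring and the final greedy selection can be sketched as follows. This is a minimal illustration with hypothetical function names, not the alignment algorithm itself: candidate stems and their E-values are taken as given, a smaller E-value is treated as more likely, and the score for a GU pair is exposed as a parameter whose default of 0 is an assumption of this sketch.

```python
PAIR_SCORES = {frozenset("GC"): 2, frozenset("AU"): 1}


def stem_score(seq, pairs, gu_score=0):
    """Sum of pair scores for a gapless stem given as (i, j) index pairs.

    gu_score is an assumed value; non-complementary pairs raise KeyError.
    """
    total = 0
    for i, j in pairs:
        pair = frozenset((seq[i], seq[j]))
        if pair == frozenset("GU"):
            total += gu_score
        else:
            total += PAIR_SCORES[pair]
    return total


def select_stems(candidates):
    """Greedy choice of mutually consistent stems.

    candidates: list of (e_value, pairs). Stems are visited in order of
    increasing E-value; a stem is dropped if it shares a base with a stem
    that has already been accepted.
    """
    used, accepted = set(), []
    for e_value, pairs in sorted(candidates, key=lambda c: c[0]):
        bases = {b for pair in pairs for b in pair}
        if bases & used:
            continue
        accepted.append((e_value, pairs))
        used |= bases
    return accepted
```

Visiting the stems once in sorted order gives the same result as repeatedly picking the most likely remaining stem and deleting the stems it excludes.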